Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition
Abstract
The most important problem in action recognition is how to represent an action video. Existing approaches can be roughly divided into four categories: (1) human pose based approaches, which utilize human structure information; (2) global action template based approaches, which capture appearance and motion information over the whole moving body; (3) local feature based approaches, which mainly extract valid space-time cuboids; and (4) unsupervised feature learning based methods, which learn the representation with hierarchical networks. Among these, local features within the bag-of-features (BoF) framework are perhaps the most popular choice for action recognition. Using this pipeline, Wang et al. [2] recently proposed dense trajectory (DT) based features for action video representation and achieved state-of-the-art performance on several action datasets. Despite its power, the DT method is expensive in memory and computation because of the large number of densely sampled points. In this paper, we improve the DT method in two ways. First, we introduce a motion boundary based dense sampling strategy, called DT-MB, which greatly reduces the number of valid trajectories while preserving discriminative power. Second, we develop a set of co-occurrence descriptors that describe the spatial-temporal context of motion trajectories. Our DT-MB is partly inspired by the MBH descriptor [2] and the motion boundary contour system (BCS) in the neural dynamics of motion perception [1]. In the sampling step, it constrains the sampled points to large-magnitude regions of the motion boundary image. A comparison with the original DT method is illustrated in Fig. 1. Our sampling approach removes a large number of points that are not on the motion foreground. We propose spatial-temporal co-occurrence HOG, HOF and MBH to further enhance the performance of DT.
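The sampling idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the grid step, the use of the larger of the two flow-component gradient magnitudes as the "motion boundary image", and the threshold at a fixed fraction of the peak magnitude are all assumptions made for illustration. The abstract only states that sampled points are constrained to large-magnitude regions of the motion boundary image.

```python
import numpy as np

def motion_boundary_magnitude(flow):
    """Return an (H, W) motion boundary magnitude map for an (H, W, 2) flow field.

    Assumption: the map is the larger of the spatial gradient magnitudes
    of the horizontal (u) and vertical (v) flow components.
    """
    u, v = flow[..., 0], flow[..., 1]
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    return np.maximum(np.hypot(ux, uy), np.hypot(vx, vy))

def sample_motion_boundary_points(flow, step=5, thresh_ratio=0.5):
    """Keep dense grid points that lie on strong motion boundaries.

    Returns (kept, grid), both integer arrays of (row, col) positions.
    `step` and `thresh_ratio` are hypothetical parameters, not values
    taken from the paper.
    """
    mb = motion_boundary_magnitude(flow)
    h, w = mb.shape
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1)
    peak = mb.max()
    if peak == 0:
        # No motion at all: nothing to track.
        return grid[:0], grid
    keep = mb[grid[:, 0], grid[:, 1]] >= thresh_ratio * peak
    return grid[keep], grid

if __name__ == "__main__":
    # Synthetic flow: a square patch moving to the right on a static background.
    flow = np.zeros((40, 40, 2))
    flow[12:22, 12:22, 0] = 3.0
    kept, grid = sample_motion_boundary_points(flow)
    print(f"kept {len(kept)} of {len(grid)} dense grid points")
```

In this toy example, points in the static background and in the uniformly moving interior of the patch are discarded, and only grid points near the patch outline survive. This mirrors the stated goal of removing points that are not on the motion foreground.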
The pipeline of the spatial co-occurrence feature in a regularized spatial-temporal grid is shown in Fig. 2, and the temporal one is depicted in Fig. 3. The spatial co-occurrence HOG [3], HOF and MBH aim to capture complex spatial structures of appearance and motion. Our novel temporal co-occurrence descriptors depict clear motion and appearance changes across successive patches. Our results for the individual co-occurrence descriptors on three datasets are illustrated in Fig. 4. They indicate that temporal context information is more effective for pure spatial features, and spatial context information for ...
Figure 2: An example of spatial co-occurrence features with a grid of size nσ × nσ × nτ.
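The co-occurrence idea can be sketched generically as a histogram over pairs of quantized orientations separated by a fixed offset. A spatial offset pairs neighbouring pixels in the same frame, and a temporal offset pairs the same pixel in successive frames. The exact descriptor definition is not given in this abstract, so everything below is an assumption: the function names, the number of bins, the unweighted counting, and the restriction to non-negative offsets.

```python
import numpy as np

def quantize_orientation(gx, gy, n_bins=8):
    """Map per-pixel vectors (gx, gy) to integer orientation bins in [0, n_bins)."""
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.floor(angle / (2 * np.pi) * n_bins).astype(int)
    # Guard against an angle that rounds to exactly 2*pi.
    return np.minimum(bins, n_bins - 1)

def cooccurrence_histogram(bins, offset, n_bins=8):
    """Count orientation pairs (bin at p, bin at p + offset) in a (T, H, W) volume.

    `offset` is (dt, dy, dx) with non-negative entries (a simplification).
    Returns an (n_bins, n_bins) count matrix.
    """
    dt, dy, dx = offset
    t, h, w = bins.shape
    ref = bins[:t - dt, :h - dy, :w - dx]   # reference positions p
    shifted = bins[dt:, dy:, dx:]           # positions p + offset
    hist = np.zeros((n_bins, n_bins))
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(hist, (ref.ravel(), shifted.ravel()), 1)
    return hist

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gx = rng.standard_normal((3, 8, 8))
    gy = rng.standard_normal((3, 8, 8))
    bins = quantize_orientation(gx, gy)
    spatial = cooccurrence_histogram(bins, offset=(0, 0, 1))
    temporal = cooccurrence_histogram(bins, offset=(1, 0, 0))
    print(spatial.shape, int(spatial.sum()), int(temporal.sum()))
```

Feeding image gradients into `quantize_orientation` gives a HOG-like variant, while optical flow or flow gradients would give HOF-like or MBH-like variants. A practical descriptor would typically also weight each pair by magnitude and normalize the histogram, both of which are omitted here.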
Related resources
Motion boundary based sampling and 3D co-occurrence descriptors for action recognition
Recent studies witness the success of Bag-of-Features (BoF) frameworks for video based human action recognition. The detection and description of local interest regions are two fundamental problems in the BoF framework. In this paper, we propose a motion boundary based sampling strategy and spatial-temporal (3D) co-occurrence descriptors for action video representation and recognition. Our sampling ...
Actions As Objects: A Novel Action Representation
In this paper, we propose to model an action based on both the shape and the motion of the object performing the action. When the object performs an action in 3D, the points on the outer boundary of the object are projected as a 2D (x, y) contour in the image plane. A sequence of such 2D contours with respect to time generates a spatiotemporal volume (STV) in (x, y, t), which can be treated as 3D...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to video event detection have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
Motion Boundary Trajectory for Human Action Recognition
In this paper, we propose a novel approach to extracting local descriptors of a video, based on two ideas: first, using the motion boundary between objects; and second, using the resulting motion boundary trajectories extracted from videos, together with other local descriptors in the neighbourhood of the extracted motion boundary trajectories: histogram of oriented gradients, histogram of optical flow, motio...
Study of Human Action Recognition Based on Improved Spatio-temporal Features
Most existing action recognition methods rely mainly on spatio-temporal descriptors of single interest points, ignoring their potential integral information, such as spatial distribution information. By combining local spatio-temporal features and global positional distribution information (PDI) of interest points, a novel motion descriptor is proposed in this paper. The proposed method detec...
Publication date: 2013